Abstract

We consider the possibility of using neural networks in experimental data analysis
at Daphne. We analyze the process γγ → π⁺π⁻π⁰ and its backgrounds using neural
networks, and we compare their performance with traditional methods that apply
cuts on several kinematical variables. We find that neural networks are more
efficient and can be of great help for processes with a small number of produced events.
1 Introduction



The Daphne Φ-factory should start operating very soon in Frascati. Its main goals are
the study of Φ decays and related processes, mainly CP violation in kaon decays,
ππ phase shifts, η decays, etc. Being an e⁺e⁻ collider, the machine is also suited
to the study of γγ physics. In this context, the golden-plate process at Daphne is γγ → π⁰π⁰.
The main reason is that the present experimental situation is not well established, at
least in the region near threshold, where good theoretical predictions exist in the context
of Chiral Perturbation Theory (ChPT). Moreover, these theoretical predictions start at
the one-loop level, so this process is a clear test of the effective-quantum-field-theory
character of ChPT. Other interesting processes have been proposed recently along the same
lines. In particular, the processes γγ → 3π are also interesting because one-loop
predictions dominate over tree-level ones [1]. They differ, however, from γγ → π⁰π⁰ in
several aspects: i) they are anomalous processes, ii) they are not exclusive tests of chiral
loops, since they receive contributions from counterterms, and iii) their cross sections are much
smaller, and thus more difficult to measure experimentally. The best way consists in tagging
the electron and positron, but at the expense of significantly reducing the number of events
because of small tagging efficiencies. It is therefore convenient to have alternative
methods with large efficiency and without lepton tagging whenever possible. We suggest
that neural networks (NN's) could be used in experimental analysis for such a purpose.
We have trained a NN with γγ → π⁺π⁻π⁰ (signal) and have considered the three main
sources of background. Our analysis avoids tagging (and is thus not penalized by small
tagging efficiencies) and obtains results which are better than those of traditional methods
based on applying cuts to a set of kinematical variables.

This work is organized as follows. In Section 2 we briefly describe ChPT and its predictions
for γγ → 3π at Daphne. Section 3 gives a short description of NN's. In Section 4 we
describe the generation of data for the signal and the analyzed backgrounds and introduce
the set of kinematical variables used as NN inputs. The performance of the
NN is compared with the usual methods of analysis in Section 5. Section 6 is devoted to
the conclusions.

2 Chiral Perturbation Theory for γγ → 3π

ChPT is an effective formulation of QCD at low energies in terms of pseudoscalar mesons as
fundamental fields […]. It is inspired by QCD, whose symmetry properties it enforces. Indeed,
the QCD Lagrangian, written in terms of quarks and gluons, possesses a chiral SU(3)_L × SU(3)_R
symmetry for massless quarks. The quark mass terms, however, break the chiral symmetry.
In the effective low-energy version of QCD, one replaces the fundamental quark and gluon
fields by the pseudoscalar mesons, imposing SU(3)_L × SU(3)_R symmetry, broken only by
terms proportional to the quark masses, which can be related to the pseudoscalar meson
masses. The ChPT Lagrangian can be written as an expansion in momenta and masses and
treated perturbatively. The lowest-order ChPT predictions are essentially equivalent to
Current Algebra. However, they receive corrections from higher-order terms, in which loops
play a particular role: they are essential to incorporate the correct analyticity, unitarity
and crossing-symmetry properties of the physical amplitudes. Moreover, loops in general give
divergences which can only be absorbed by tree-level counterterms present in the higher-order
Lagrangian. The number of needed counterterms depends on the order of the momentum expansion.
This is a consequence of the non-renormalizability of the theory (in the classical sense that
all divergences can be absorbed in a fixed, finite number of terms). In spite of that, the
theory can still give predictions provided one can fix the values of the counterterms through
related processes. When this is not possible, one has to rely on phenomenological models to
estimate those counterterms, at the expense of introducing model dependence. Electroweak
interactions can be introduced in a systematic and self-consistent way.

One distinguishes two sectors in ChPT. The normal, even-intrinsic-parity sector treats
processes such as ππ → ππ, η → 3π, or γγ → ππ. The anomalous, odd-intrinsic-parity
sector accounts for processes such as π⁰ → γγ, η → ππγ, and γγ → 3π. In the
former sector, the γγ → π⁰π⁰ process plays an important role, because there is
no tree-level contribution […]. The first non-vanishing contribution starts at O(p⁴) in the
momentum expansion and is entirely given by one-loop diagrams, with no contribution
from the O(p²) tree-level Lagrangian. This makes the process very interesting, since it tests
the loop predictions of the theory. Similarly, the anomalous processes γγ → 3π
receive contributions at O(p⁶) which dominate over the non-vanishing tree-level ones
and test the loop predictions in the anomalous sector, although in a less stringent way, since
there are two types of O(p⁶) contributions, loops and counterterms.
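Weinberg's power-counting rule, a standard ChPT result that the text uses implicitly, makes these orders explicit. For a diagram with L loops and N_d vertices taken from the O(p^d) Lagrangian, the chiral order of the amplitude is D as below; the evaluations for the two processes are our own illustration of the counting.

```latex
D = 2 + 2L + \sum_{d} N_d \,(d - 2)

% gamma gamma -> pi0 pi0: one loop (L = 1) built only from O(p^2) vertices (d = 2)
D = 2 + 2 \cdot 1 + 0 = 4 \;\Rightarrow\; \mathcal{O}(p^4)

% gamma gamma -> 3 pi: tree level with one anomalous O(p^4) vertex (L = 0, d = 4)
D = 2 + 0 + (4 - 2) = 4 \;\Rightarrow\; \mathcal{O}(p^4)

% gamma gamma -> 3 pi: one loop with one anomalous O(p^4) vertex (L = 1, d = 4)
D = 2 + 2 \cdot 1 + (4 - 2) = 6 \;\Rightarrow\; \mathcal{O}(p^6)
```

Anomalous O(p⁶) counterterms enter at tree level with D = 2 + (6 − 2) = 6, the same order as the loops, which is why the anomalous channel tests the loops less cleanly than γγ → π⁰π⁰.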

In Refs. […] the amplitudes for γγ → π⁺π⁻π⁰ and γγ → 3π⁰ have been obtained, together with
the corresponding predictions for the expected number of events at Daphne. One expects
around 180 (23) events per year for the first (second) process. These are quite moderate
numbers of events and require good strategies for their eventual experimental detection. We
suggest that using NN's can be a good way to perform an efficient analysis. We
restrict our analysis to the first, charged channel, which looks a priori more promising than
the neutral one.



3 Neural Networks 

Neural Networks (NN's) are useful tools for pattern recognition. In high energy physics,
they have been used or proposed as good candidates for signal-versus-background
classification tasks. Some examples are Higgs searches […], b and τ analysis, quark and
gluon jet analysis […], the determination of the branching ratios of the Z into heavy quarks [10],
bottom-jet recognition [11] and top-quark searches at pp colliders […]. Recently, NN's have been
used for experimental top-quark searches at the Tevatron [14].

We have considered layered feed-forward NN's with topologies N_i × N_h1 × N_h2 × N_o,
where N_i (N_o) is the number of input (output) neurons and N_h1, N_h2 are the numbers of
neurons in the two hidden layers.
The input of neuron i in layer ℓ is given by



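$$ I_i^{1} = \mathrm{in}_i^{e} \quad (\ell = 1), \qquad I_i^{\ell} = \sum_j w_{ij}^{\ell} S_j^{\ell-1} + B_i^{\ell} \quad (\ell = 2, 3, 4), \qquad (1) $$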

where in_i^e is the set of kinematical variables describing a physical event e, the sum
runs over the neurons of the preceding layer (ℓ − 1), S_j^{ℓ−1} is the state of neuron j,
w_ij^ℓ is the connection weight between neuron j and neuron i, and B_i^ℓ is a bias input
to neuron i. The state of a neuron is a function of its input, S_j^ℓ = F(I_j^ℓ), where F is the
neuron response function. In this study the sigmoid function, F(I_j) = 1/(1 + exp(−I_j)),
has been chosen. This function offers a more sensitive modeling of real data than a linear
one.
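The forward pass of Eq. (1) can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' code: the random initialization range and the choice to let the input neurons pass the kinematical variables through unchanged are our assumptions.

```python
import math
import random


def sigmoid(x):
    """Neuron response function F(I) = 1 / (1 + exp(-I)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_network(sizes, seed=0):
    """Random weights and biases for a layered net, e.g. sizes = [9, 11, 5, 1].

    weights[l][i][j] connects neuron j of layer l+1 to neuron i of layer l+2
    (layers counted from 1 as in Eq. (1)); biases[l][i] is B_i of layer l+2.
    The uniform(-0.5, 0.5) range is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for n_out in sizes[1:]]
    return weights, biases


def forward(weights, biases, event):
    """States of all layers for one event.

    Layer 1 holds the kinematical variables unchanged (assumption); every later
    neuron i receives I_i = sum_j w_ij S_j + B_i and takes the state S_i = F(I_i).
    """
    states = [list(event)]
    for w, b in zip(weights, biases):
        prev = states[-1]
        states.append([sigmoid(sum(wij * sj for wij, sj in zip(row, prev)) + bi)
                       for row, bi in zip(w, b)])
    return states


# Usage: a 9 x 11 x 5 x 1 net (the NN9 topology) applied to one hypothetical event.
weights, biases = init_network([9, 11, 5, 1])
output = forward(weights, biases, [0.1] * 9)[-1][0]  # a number in (0, 1)
```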

Back-propagation was used as the learning algorithm. Its main objective is to minimize 
the quadratic output-error E, 

$$ E = E(\mathrm{in}^e, \mathrm{out}^e; w_{ij}^{\ell}, B_i^{\ell}) = \frac{1}{2} \sum_e \left( o^e - \mathrm{out}^e \right)^2 . \qquad (2) $$

This minimization is achieved by adjusting the parameters w_ij^ℓ and B_i^ℓ, where o^e is the
state of the output neuron for event e, out^e is its desired state, and e runs over the
learning sample. Taking the desired output as 1 for signal events and 0 for background
events, the network output gives, after training, the conditional probability that a new test
event presented to the network is of signal type (and, complementarily, of background type) [15],
provided that the signal/background ratio used in the learning phase corresponds to the real one.
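A minimal sketch of on-line back-propagation for the quadratic error of Eq. (2) follows, with the weights updated after each event. The learning rate `lr`, the initial weight range and the two-variable toy sample are hypothetical choices for illustration; the paper does not quote them.

```python
import math
import random


def sigmoid(x):
    """F(I) = 1 / (1 + exp(-I)), written to avoid overflow for large |I|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class Net:
    """Layered feed-forward net with sigmoid neurons and a single output neuron."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # w[l][i][j]: weight from neuron j of one layer to neuron i of the next
        self.w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.b = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for n in sizes[1:]]

    def forward(self, x):
        states = [list(x)]  # input neurons pass the kinematical variables through
        for w, b in zip(self.w, self.b):
            prev = states[-1]
            states.append([sigmoid(sum(a * s for a, s in zip(row, prev)) + bi)
                           for row, bi in zip(w, b)])
        return states

    def error(self, sample):
        """E = 1/2 sum_e (o^e - out^e)^2 over (event, desired output) pairs."""
        return 0.5 * sum((self.forward(x)[-1][0] - t) ** 2 for x, t in sample)

    def train_event(self, x, target, lr=0.5):
        """One back-propagation step; weights are updated after every event."""
        states = self.forward(x)
        o = states[-1][0]
        # Output neuron: dE/dI = (o - out) * F'(I), with F' = F (1 - F) for the sigmoid
        deltas = [[(o - target) * o * (1.0 - o)]]
        # Propagate the deltas backwards through the hidden layers
        for l in range(len(self.w) - 1, 0, -1):
            s, nxt = states[l], deltas[0]
            deltas.insert(0, [s[j] * (1.0 - s[j]) *
                              sum(self.w[l][k][j] * nxt[k] for k in range(len(nxt)))
                              for j in range(len(s))])
        # Gradient-descent update of all weights and biases
        for l, (w, b) in enumerate(zip(self.w, self.b)):
            for i, d in enumerate(deltas[l]):
                for j, s_j in enumerate(states[l]):
                    w[i][j] -= lr * d * s_j
                b[i] -= lr * d


# Usage on a toy, well-separated sample (desired output 1 for signal, 0 for background).
rng = random.Random(3)
sample = ([([rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)], 1.0) for _ in range(50)]
          + [([rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)], 0.0) for _ in range(50)])
net = Net([2, 4, 3, 1])
e_before = net.error(sample)
for epoch in range(100):
    rng.shuffle(sample)
    for x, t in sample:
        net.train_event(x, t)
e_after = net.error(sample)  # expected to end up well below e_before
```

Shuffling the learning sample each pass avoids long runs of one class, which matters when the weights are updated event by event.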

Weights are updated after each event presented to the NN during the learning phase.
Once the quadratic error E reaches its minimum value, they are kept fixed and used
in the testing phase, where the NN acts as a signal-background classifier. A frequent
problem encountered in NN training is over-learning, which takes place when the NN interprets
statistical fluctuations as real differences. In this study over-learning is avoided by monitoring
the evolution of the error on a test sample, E_t, and stopping the learning when E_t starts to
increase, even if the learning error E still continues to decrease.
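The stopping rule above can be sketched as a small training loop. The callbacks `train_epoch` and `test_error` are hypothetical placeholders for one pass over the learning sample and for the evaluation of E_t on the test sample; returning the parameters saved at the E_t minimum is our assumption about how the final weights are fixed.

```python
import copy
from typing import Callable, List, Tuple


def train_with_early_stopping(params, train_epoch: Callable[[object], None],
                              test_error: Callable[[object], float],
                              max_epochs: int = 1000) -> Tuple[object, int, List[float]]:
    """Train until the test-sample error E_t starts to increase.

    params      : mutable model parameters (e.g. weights and biases)
    train_epoch : hypothetical callback running one pass over the learning sample
    test_error  : hypothetical callback returning E_t on the test sample
    Returns the parameters at the E_t minimum, that epoch, and the E_t history.
    """
    best = copy.deepcopy(params)
    history = [test_error(params)]
    for epoch in range(1, max_epochs + 1):
        train_epoch(params)        # the learning error E may keep decreasing...
        e_t = test_error(params)
        if e_t > history[-1]:      # ...but E_t has started to increase: stop here
            return best, epoch - 1, history
        history.append(e_t)
        best = copy.deepcopy(params)
    return best, max_epochs, history
```

For example, with a test error that falls as 5.0, 4.0, 3.0 and then rises to 3.5, the loop stops after the third epoch and returns the parameters saved after the second.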



4 Data generation for signal and backgrounds 

We take as signal in our analysis the process γγ → π⁺π⁻π⁰ as predicted by ChPT at O(p⁶)
[…] at Daphne, running at the e⁺e⁻ center-of-mass energy √s = M_Φ. We avoid tagging
the leptons in the analysis, in order to keep all produced events. In so doing, we have to
consider several types of background. We analyzed the following ones:

B1) γγ → η → π⁺π⁻π⁰
B2) e⁺e⁻ → ω, Φ → π⁺π⁻π⁰γ
B3) e⁺e⁻ → (ω, Φ)γ → π⁺π⁻π⁰γ
The first background, B1, is η photoproduction. It has the same origin as the signal
and differs from it in that the invariant mass of the three-pion system is strongly peaked
around the η mass. Background B2 consists of the production of a real (virtual) Φ
(ω) vector meson which decays into π⁺π⁻π⁰γ. Background B3 accounts for the virtual
production of Φ or ω vector mesons together with one initial-state bremsstrahlung photon. In
the last two processes, we demand the presence of one undetected photon. This photon decreases
the available energy of the three-pion system and eliminates the production and decay of
a virtual ω into 3π as a potential background. The photon escapes detection mainly by going
through the beam pipe.

As previously mentioned, the signal has been predicted in the context of
ChPT. The first background has been estimated in the same context, but using a constant
matrix element evaluated at the center of the Dalitz plot. This is a good approximation
for our purpose of estimating the range of energies where this background is important.
Backgrounds B2 and B3 have been computed using vector meson dominance.

We have generated Monte Carlo events for the signal and the backgrounds, satisfying 
the following generation cuts: 

1) All pions are in the detector, which has almost full 4π acceptance except for the
beam pipe, corresponding to a fraction of 2% of the total solid angle.

2) The invariant mass of the three-pion state is restricted to the range 3m_π < m_3π <
0.7 GeV. The upper limit is conservatively chosen so that the ChPT matrix elements used
for the signal and background B1, computed at fixed order in the chiral expansion, can be
trusted. For larger invariant masses, higher-order corrections are expected to become
important and could significantly modify the estimate of the signal.

3) The photons produced in backgrounds B2 and B3 escape detection through the
beam pipe. Otherwise they would be easily detected, since their energy is forced to
lie in the range 255 MeV < E_γ < 341 MeV by the above constraint on m_3π.

4) As we are interested in a final 3π state not coming from η production, we make an
additional cut on m_3π: we demand m_3π < m_η − Δ or m_3π > m_η + Δ. Taking Δ = 20 MeV, the
first background is practically eliminated, since it is only important around the η mass
region.
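The generation cuts can be expressed as a simple event filter. In this sketch the cut-4 veto is read as rejecting |m_3π − m_η| < Δ, the threshold 3m_π is taken as 2m_π± + m_π⁰, the meson masses are PDG-like values we supply, and the beam pipe is modelled as two symmetric cones around the beam axis. Under that last assumption a 2% solid-angle hole corresponds to |cos θ| ≥ 0.98, since two cones of half-angle θ0 cover a fraction 1 − cos θ0 of 4π.

```python
import math

# Assumed constants in GeV (PDG-like values). The text quotes only 3 m_pi,
# 0.7 GeV, Delta = 20 MeV and a beam pipe covering 2% of the solid angle.
M_PI_CHARGED = 0.13957
M_PI_NEUTRAL = 0.13498
M_ETA = 0.54786
M3PI_MAX = 0.7
DELTA = 0.020
# Two symmetric cones of half-angle theta0 around the beam axis cover a fraction
# 1 - cos(theta0) of 4 pi, so a 2% beam pipe leaves |cos(theta)| < 0.98 detected.
COS_THETA_MAX = 1.0 - 0.02


def invariant_mass(momenta):
    """Invariant mass of a set of four-momenta (E, px, py, pz)."""
    e, px, py, pz = (sum(p[k] for p in momenta) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def detected(p):
    """True if the particle points outside the (modelled) beam pipe."""
    _, px, py, pz = p
    pmag = math.sqrt(px * px + py * py + pz * pz)
    return pmag > 0.0 and abs(pz) / pmag < COS_THETA_MAX


def passes_generation_cuts(pions, photon=None):
    """Cuts 1-4 for (pi+, pi-, pi0); photon is the extra B2/B3 photon, if any."""
    if not all(detected(p) for p in pions):                    # cut 1
        return False
    m3pi = invariant_mass(pions)
    if not 2 * M_PI_CHARGED + M_PI_NEUTRAL < m3pi < M3PI_MAX:  # cut 2
        return False
    if photon is not None and detected(photon):                # cut 3
        return False
    return abs(m3pi - M_ETA) > DELTA                           # cut 4 (eta veto)
```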

The numbers of expected events per year passing the generation cuts, for the Daphne
integrated luminosity ∫L dt = 5 × 10⁶ nb⁻¹, are 71, 0.07, 1714 and 776 for the signal and
for backgrounds B1, B2 and B3, respectively. Background B1 can therefore be safely discarded.

For the analysis and as inputs to the NN, we chose the following kinematical variables:

• 1) the π⁺ transverse momentum,

• 2) the π⁻ transverse momentum,

• 3) the π⁰ transverse momentum,

• 4) the transverse energy of the three-pion system,

• 5) the π⁺ pseudorapidity,

• 6) the π⁻ pseudorapidity,

• 7) the π⁰ pseudorapidity,

• 8) the sphericity of the three-pion system in the three-pion center-of-mass frame,

• 9) the difference between m_ρ and its best approximation by the invariant masses of
all possible pion pairs.
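The nine variables can be computed from the pion four-momenta (E, p_x, p_y, p_z), with z along the beam, roughly as below. This is a sketch under assumptions: the text does not define the three-pion transverse energy, so Σ E_i sin θ_i is used here; variable 9 is taken as the absolute difference from the pair mass closest to m_ρ; and the m_ρ value is ours.

```python
import math

M_RHO = 0.7753  # GeV; assumed rho(770) mass for variable 9


def pt(p):
    """Transverse momentum with respect to the beam (z) axis; p = (E, px, py, pz)."""
    return math.hypot(p[1], p[2])


def pseudorapidity(p):
    """eta = -ln tan(theta/2) = asinh(pz / pT); requires pT > 0."""
    return math.asinh(p[3] / pt(p))


def transverse_energy(momenta):
    """Assumed definition: sum of E_i sin(theta_i) over the three pions."""
    return sum(p[0] * pt(p) / math.sqrt(pt(p) ** 2 + p[3] ** 2) for p in momenta)


def boost_to_cm(momenta):
    """Boost four-momenta to the rest frame of their sum."""
    e, px, py, pz = (sum(p[k] for p in momenta) for k in range(4))
    bx, by, bz = px / e, py / e, pz / e
    b2 = bx * bx + by * by + bz * bz
    if b2 == 0.0:
        return [tuple(p) for p in momenta]
    g = 1.0 / math.sqrt(1.0 - b2)
    out = []
    for en, qx, qy, qz in momenta:
        bq = bx * qx + by * qy + bz * qz
        k = (g - 1.0) * bq / b2 - g * en
        out.append((g * (en - bq), qx + k * bx, qy + k * by, qz + k * bz))
    return out


def sphericity_cm(momenta):
    """S = 3/2 (lambda2 + lambda3) = 3/2 (1 - lambda1) of the normalized
    momentum tensor, evaluated in the three-pion centre-of-mass frame."""
    vecs = [p[1:] for p in boost_to_cm(momenta)]
    norm = sum(v[0] ** 2 + v[1] ** 2 + v[2] ** 2 for v in vecs)
    a = [[sum(v[i] * v[j] for v in vecs) / norm for j in range(3)] for i in range(3)]
    # Largest eigenvalue of the symmetric 3x3 tensor (closed-form trigonometric method)
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    if p1 == 0.0:
        lam1 = max(a[0][0], a[1][1], a[2][2])
    else:
        p = math.sqrt(((a[0][0] - q) ** 2 + (a[1][1] - q) ** 2 + (a[2][2] - q) ** 2
                       + 2.0 * p1) / 6.0)
        b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
        det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
                 - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
                 + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
        phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
        lam1 = q + 2.0 * p * math.cos(phi)
    return 1.5 * (1.0 - lam1)


def rho_mass_difference(momenta):
    """Variable 9: |m_rho - m_ij| for the pion pair whose mass is closest to m_rho."""
    def pair_mass(p, r):
        e, x, y, z = (p[k] + r[k] for k in range(4))
        return math.sqrt(max(e * e - x * x - y * y - z * z, 0.0))
    pairs = [(0, 1), (0, 2), (1, 2)]
    return min(abs(M_RHO - pair_mass(momenta[i], momenta[j])) for i, j in pairs)
```

Because the three momenta sum to zero in their centre-of-mass frame, they are coplanar there and the smallest tensor eigenvalue vanishes, so S ranges from 0 (back-to-back topology) to 0.75 (symmetric "Mercedes" topology).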

All the above variables are self-explanatory, except the last one, which was chosen
because background B3 is mediated by ρ exchange with its subsequent decay into a pion
pair. Needless to say, one could have considered other variables, such as angular
correlations among the final pions, which carry additional information on the
physics of the process and could help in separating signal from background.

5 Results 

Rather than using the expected number of events produced at Daphne, we generated larger
samples of 10000 signal, B2 and B3 background events passing the generation cuts. From
each of these, 8000 events were used to train a (N_i = 9) × (N_h1 = 11) × (N_h2 = 5) × (N_o = 1)
NN (denoted NN9 from now on) to give output 1 for the signal and 0 for the
backgrounds. The remaining events were reserved for testing the NN and for the analysis in the
classical way. (Background B1 is eliminated by generation cut 4.) The results obtained
were rescaled to the expected number of events produced at Daphne per year. In
Fig. 1 we show the distribution of the test events that survive as a function of the NN9
output cut. (An event survives if its output is larger than the chosen output
cut.) The dot-dashed (solid) line corresponds to the signal (total background) events. One
can see that the signal events are strongly peaked at output values very close to one, while
the background events tend to concentrate at values close to zero. Clearly, one can
select subsamples richer in signal or in background with suitable choices of NN output cut.
In our case, we are interested in improving the signal-to-background ratio, so we
accept events with outputs larger than a given output cut. A good variable to parametrize
the efficiency of the analysis is the statistical significance, defined as S_s = N_s/√N_b, where
N_s (N_b) is the number of accepted signal (background) events. The solid line in Fig. 2 shows
the statistical significance S_s as a function of the NN9 output cut. The curve has been
plotted for output cut values up to 0.95, to avoid strong fluctuations in its estimate
due to lack of statistics. For output cuts around 0.9, the achievable S_s is around 60,
indicating that the NN does a very good job of recognizing the signal against the
considered backgrounds.
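The curve of Fig. 2 amounts to a scan over output cuts, with each test sample rescaled to its yearly yield. A minimal sketch, using the yields quoted in Section 4 (71 signal, 1714 B2 and 776 B3 events per year); the toy outputs used to exercise it are invented for illustration.

```python
import math


def significance_scan(sig_outputs, n_sig_expected, backgrounds, cuts):
    """Statistical significance S_s = N_s / sqrt(N_b) versus the NN output cut.

    sig_outputs    : NN outputs for the signal test events
    n_sig_expected : expected signal events per year after generation cuts
    backgrounds    : {name: (NN outputs for that background, expected events per year)}
    Each test sample is rescaled to its expected yield; an event is accepted
    when its output is larger than the cut.
    """
    rows = []
    for cut in cuts:
        n_s = n_sig_expected * sum(o > cut for o in sig_outputs) / len(sig_outputs)
        n_b = sum(n_exp * sum(o > cut for o in outs) / len(outs)
                  for outs, n_exp in backgrounds.values())
        # With no surviving background test events the estimate is undefined,
        # which is why very tight cuts suffer from lack of statistics.
        s_s = n_s / math.sqrt(n_b) if n_b > 0 else float("inf")
        rows.append((cut, n_s, n_b, s_s))
    return rows
```

With 2000 test events per channel (the 10000 generated minus the 8000 used for training), a single surviving B2 test event already stands for about 0.86 events per year (1714/2000), which illustrates why the estimate fluctuates strongly at tight cuts.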

At this point, we would like to stress the benefits of using the NN over more traditional
methods of experimental analysis. Usually, experimentalists apply several
cuts on some kinematical variables to isolate the regions where the signal differs most
from the backgrounds. When a large number of variables is considered, this procedure is
usually carried out by means of linear cuts, isolating hypercubical regions in favor of the signal.
Its efficiency is known to be lower than that achievable by NN techniques [12]. One may
wonder, however, how the NN results compare with smarter ways of applying cuts. In
particular, one clearly has to isolate regions of the parameter space with complicated
geometry, so linear cuts will have limited success in general. One has to consider non-linear
cuts specially designed after prior inspection of the signal and background. In practice, this
is only feasible for a small number of variables. We performed an analysis along
these lines using the three most significant variables. These were obtained from
the nine original variables using the methods discussed in Ref. […], involving the weights
connecting the inputs with the first hidden layer. They turned out to be the pseudorapidities
of the pions. The topologies of signal and background events in the space of the three
pseudorapidities look qualitatively different. The signal tends to lie inside an ellipsoidal surface
centered at the origin, while the background events are preferentially distributed in two
separated regions, symmetrically located with respect to the origin.
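The cited reference's exact procedure is not reproduced here. One common heuristic of this kind, shown purely as an assumption, scores each input by the summed absolute weights connecting it to the first hidden layer; the comparison is only meaningful when the inputs have been normalized to comparable ranges.

```python
def rank_inputs(first_layer_weights, names, top=3):
    """Rank input variables by sum_i |w_ik| over the first hidden layer.

    first_layer_weights[i][k] is the weight from input k to hidden neuron i.
    Returns the `top` (name, score) pairs, most significant first.
    """
    scores = [sum(abs(row[k]) for row in first_layer_weights) for k in range(len(names))]
    order = sorted(range(len(names)), key=lambda k: scores[k], reverse=True)
    return [(names[k], scores[k]) for k in order[:top]]
```

Applied to the 11 × 9 weight matrix between the inputs and the first hidden layer of NN9, with the nine variable names in input order, the three highest-scoring names would be the candidates for a reduced analysis.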

We could isolate non-linear regions with statistical significances up to 10. Note that
this result must not be compared with the NN9 results, which were obtained using
the full set of 9 original variables. To make a fair comparison, we trained a
smaller NN using the same three input variables, denoted NN3, with topology
(N_i = 3) × (N_h1 = 5) × (N_h2 = 5) × (N_o = 1). The results of NN3 are also shown
in the figures. The dotted (dashed) line in Fig. 1 shows the accumulated number of signal
(background) events as a function of the NN3 output cut. The reconstruction of the signal
is fairly good, but the background rejection is much worse than with NN9. This translates
into much smaller statistical significances, typically by a factor of 6, as shown by the
dashed line of Fig. 2, where S_s is plotted as a function of the NN3 output cut.

Two comments are in order. First, note that we reduced the variables to three in order
to be able to design good sets of non-linear cuts with the help of
three-dimensional distributions of events. If more variables were kept, the statistical
significances would not be so drastically reduced. Second, for output cuts larger than 0.85,
the reduced NN3 is at least as efficient as the best optimized non-linear classical cuts
we could find. Moreover, the NN3 has a great advantage over the non-linear
cuts: whereas the latter have to be designed by visual inspection and require dedicated
work, the NN operates in a completely automatic way, with comparable efficiency. This
is not surprising: it is due to the highly non-linear behaviour of NN's, which
allows them to select complicated regions of the parameter space in an automatic and
painless way.

Finally, NN's can easily be trained for any number of input variables, whereas
non-linear classical analyses are strongly limited to a small number of variables. This makes
NN's very useful tools for processes where high efficiencies are needed.
6 Conclusions



We have considered the ability of NN's to perform the experimental analysis of the process
γγ → π⁺π⁻π⁰ at Daphne. The ChPT prediction for the number of produced events is
relatively small, so efficient methods for its detection and analysis can
be of great help. We have considered three types of background which mimic the signal,
and we have avoided tagging of the initial leptons, which would imply a sizeable reduction
of event statistics. Using a set of nine kinematical variables as inputs of a NN9, we have
obtained large statistical significances over a wide interval of output cuts.

We have also studied the expected efficiencies for a smaller NN3, using the three pion
pseudorapidities as inputs, and compared them with the efficiencies found using classical
analyses based on non-linear cuts in the same variables. We have found that the NN3
statistical significances, obtained in an automatic and painless way, are at least as good as
the best result we could find using non-linear cuts chosen through careful inspection of
the distribution of signal and background events in the three-variable space. This is
due to the highly non-linear behaviour of the NN, which isolates the phase-space regions
where the signal differs significantly from the background. However, the NN3 efficiency is
much smaller than that obtained with NN9. It is therefore highly recommended to use
large sets of kinematical variables to ensure large efficiencies. This represents no extra
effort for the NN's but can be a great challenge for classical methods. We finally stress
that the usefulness of NN's is not restricted to the signal analysed here; they can be shown to
work similarly for any other process of interest.
Figure 1: Accumulated number of events with output larger than a given output cut. Solid
(dashed) and dot-dashed (dotted) lines correspond to the background and signal results, 
respectively, for NN9 (NN3). 
Figure 2: Statistical significance as a function of the NN output cut. Solid (dashed) line
corresponds to the NN9 (NN3) net. 